These are the first experiments where I use version 2 chemistry. Read 2 is now the barcode and UMI. Read 1 is the cDNA. This was done to avoid reading through the TSO on Ilumina read 1
This sequencing run consists of 3 experiments:
Differential expression testing. Use a spline based analysis from the
edgeR manual.
In notebook 3B I verbatim reproduce analysis Zac Moore sent me.
This object was generated in 2A_Clustering.
Subset only day 3 timepoint for simplicity and look at varying
doses.
Have a look at the important metadata.
Remove genes that are lowly expressed. 12,000 genes are kept
## Mode FALSE TRUE
## logical 11851 12401
Correct for composition biases by computing normalization factors with the trimmed mean of M-values method.
Check the MDS plot
Use a cubic regression spline on dose with 3 degrees of freedom.
splines <- ns(dge$samples$Dose_M, df = 3)
design <- model.matrix(~splines, dge$samples)
head(design)## (Intercept) splines1 splines2 splines3
## AAAGACTTCG 1 -0.03452765 0.07280511 -0.03821946
## AACGTCAGCC 1 -0.01100756 0.02317551 -0.01216612
## ACAGTAGCTC 1 -0.16611354 0.34968007 0.81643346
## AGCACGTTTA 1 -0.23455942 0.54199354 -0.28442210
## AGTGACAGCG 1 0.30020461 0.49821607 -0.15593612
## ATCAAGAACG 1 -0.16249002 0.72269905 -0.37463953
The three coefficients do not have any particular meaning. Hypothesis testing would only make sense if the three coefficients are assessed together. The advantage of using a cubic spline curve is that it provides more stable fit at the end points compared to a polynomial.
The spline curve with 3 degrees of freedom has 2 knots where cubic polynomials are splined together. In general, choosing a number of degrees of freedom to be in range of 3-5 is reasonable.
Compared to the edgeR example on timecourse analysis the biological variation is pretty low.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01261 0.01562 0.02242 0.02894 0.03809 0.06340
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9317 1.0120 1.0613 1.0480 1.0896 1.1198
In a dose response experiment, we are looking for genes that change
expression level across doses. Here, the design matrix uses 3 natural
spline basis vectors to model smooth changes over concentration, without
assuming any particular pattern to the trend.
We test for a trend by conducting F-tests on 3 df for each gene:
The topTags function lists the top set of genes with most significant dose effects.
The total number of genes with significant (5% FDR) changes at different doses can be examined with decideTests.
There are 681 differentially expressed genes across dose.
## splines3-splines2-splines1
## NotSig 11727
## Sig 674
Note that all three spline coefficients should be tested together in
this way. It is not meaningful to replace the F-tests with t-tests for
the individual coefficients, and similarly the logFC columns of the top
table do not have any interpretable meaning.
The trends should instead be interpreted by way of trend plots, as we
show now.
Finally, we visualize the fitted spline curves for the top four genes.
We start by computing the observed and fitted log-CPM values for each
gene:
top_genes <- head(results$Symbol,6)
logCPM.obs <- as.data.frame(cpm(dge, log=TRUE, prior.count=fit$prior.count))
logCPM.obs$ID <- row.names(logCPM.obs)
logCPM.fit <- as.data.frame(cpm(fit, log=TRUE))
logCPM.fit$ID <- row.names(logCPM.fit)
# Mung the cpm columns into a single tibble
lcm.obs <- pivot_longer(logCPM.obs[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>%
rename(obs_cpm = count)
lcm.fit <- pivot_longer(logCPM.fit[top_genes,], cols = -ID, names_to = "sample", values_to = "count") %>%
rename(fit_cpm = count)
lcm.obs$fit_cpm <- lcm.fit$fit_cpm
# Add back the sample metadata
lcm <- left_join(lcm.obs, results[,c("ID", "Symbol")])
lcm <- left_join(lcm, dge$samples[,c("Well_BC", "Dose_M")], by=c("sample" = "Well_BC"))Generate the plot.
It makes more sense to plot x axis in log scale
custom_labels <- c("Vehicle", "100 nM", "1 \u03bcM", "10 \u03bcM", "100 \u03bcM")
ggplot(lcm) +
geom_point(aes(y = obs_cpm, x=Dose_M), color = "black") +
geom_smooth(aes(y = fit_cpm, x=Dose_M), method = "loess", se = FALSE, color = "red") +
xlab("TMZ (M)") + ylab("logCPM") +
scale_x_continuous(trans = "log10") + annotation_logticks(sides="b") +
theme_Publication() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
theme(strip.text = element_text(face = "bold.italic"))Generate the plot linear x axis. The low doses are too compressed which is why the loess curve looks strange.
custom_labels <- c("Vehicle", "100 nM", "1 \u03bcM", "10 \u03bcM", "100 \u03bcM")
ggplot(lcm) +
geom_point(aes(y = obs_cpm, x=Dose_M), color = "black") +
geom_smooth(aes(y = fit_cpm, x=Dose_M), method = "loess", se = FALSE, color = "red") +
xlab("TMZ (M)") + ylab("logCPM") +
theme_Publication() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
facet_wrap(~ ID, scales = "free_y", labeller = labeller(ID = label_value)) +
theme(strip.text = element_text(face = "bold.italic"))## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Red Hat Enterprise Linux 9.4 (Plow)
##
## Matrix products: default
## BLAS: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRblas.so
## LAPACK: /stornext/System/data/software/rhel/9/base/tools/R/4.4.1/lib64/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Australia/Melbourne
## tzcode source: system (glibc)
##
## attached base packages:
## [1] grid splines stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] ggthemes_5.1.0 here_1.0.1
## [3] patchwork_1.3.0 scater_1.32.1
## [5] edgeR_4.2.2 limma_3.60.6
## [7] biomaRt_2.60.1 scuttle_1.14.0
## [9] lubridate_1.9.3 forcats_1.0.0
## [11] stringr_1.5.1 dplyr_1.1.4
## [13] purrr_1.0.2 readr_2.1.5
## [15] tidyr_1.3.1 tibble_3.2.1
## [17] ggplot2_3.5.1 tidyverse_2.0.0
## [19] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0
## [21] Biobase_2.64.0 GenomicRanges_1.56.2
## [23] GenomeInfoDb_1.40.1 IRanges_2.38.1
## [25] S4Vectors_0.42.1 BiocGenerics_0.50.0
## [27] MatrixGenerics_1.16.0 matrixStats_1.4.1
##
## loaded via a namespace (and not attached):
## [1] rstudioapi_0.17.0 jsonlite_1.8.9
## [3] magrittr_2.0.3 ggbeeswarm_0.7.2
## [5] farver_2.1.2 rmarkdown_2.28
## [7] zlibbioc_1.50.0 vctrs_0.6.5
## [9] memoise_2.0.1 DelayedMatrixStats_1.26.0
## [11] htmltools_0.5.8.1 S4Arrays_1.4.1
## [13] progress_1.2.3 curl_5.2.3
## [15] BiocNeighbors_1.22.0 SparseArray_1.4.8
## [17] sass_0.4.9 bslib_0.8.0
## [19] httr2_1.0.5 cachem_1.1.0
## [21] igraph_2.0.3 lifecycle_1.0.4
## [23] pkgconfig_2.0.3 rsvd_1.0.5
## [25] Matrix_1.7-0 R6_2.5.1
## [27] fastmap_1.2.0 GenomeInfoDbData_1.2.12
## [29] digest_0.6.37 colorspace_2.1-1
## [31] AnnotationDbi_1.66.0 rprojroot_2.0.4
## [33] dqrng_0.4.1 irlba_2.3.5.1
## [35] RSQLite_2.3.7 beachmat_2.20.0
## [37] labeling_0.4.3 filelock_1.0.3
## [39] fansi_1.0.6 timechange_0.3.0
## [41] mgcv_1.9-1 httr_1.4.7
## [43] abind_1.4-8 compiler_4.4.1
## [45] bit64_4.5.2 withr_3.0.1
## [47] BiocParallel_1.38.0 viridis_0.6.5
## [49] DBI_1.2.3 highr_0.11
## [51] rappdirs_0.3.3 DelayedArray_0.30.1
## [53] bluster_1.14.0 tools_4.4.1
## [55] vipor_0.4.7 beeswarm_0.4.0
## [57] glue_1.8.0 nlme_3.1-164
## [59] cluster_2.1.6 generics_0.1.3
## [61] gtable_0.3.5 tzdb_0.4.0
## [63] hms_1.1.3 metapod_1.12.0
## [65] BiocSingular_1.20.0 ScaledMatrix_1.12.0
## [67] xml2_1.3.6 utf8_1.2.4
## [69] XVector_0.44.0 ggrepel_0.9.6
## [71] pillar_1.9.0 BiocFileCache_2.12.0
## [73] lattice_0.22-6 bit_4.5.0
## [75] tidyselect_1.2.1 locfit_1.5-9.10
## [77] Biostrings_2.72.1 knitr_1.48
## [79] gridExtra_2.3 xfun_0.48
## [81] statmod_1.5.0 stringi_1.8.4
## [83] UCSC.utils_1.0.0 yaml_2.3.10
## [85] evaluate_1.0.1 codetools_0.2-20
## [87] cli_3.6.3 munsell_0.5.1
## [89] jquerylib_0.1.4 Rcpp_1.0.13
## [91] dbplyr_2.5.0 png_0.1-8
## [93] parallel_4.4.1 blob_1.2.4
## [95] prettyunits_1.2.0 scran_1.32.0
## [97] sparseMatrixStats_1.16.0 viridisLite_0.4.2
## [99] scales_1.3.0 crayon_1.5.3
## [101] rlang_1.1.4 KEGGREST_1.44.1